Learning words and speech units through natural interactions
Abstract
This work presents an ecological approach to learning words and speech units through natural interactions, without the need for preprogrammed linguistic knowledge in the form of phonemes. Interactions such as imitation games and multimodal word learning create an initial set of words and speech units. These sets are then used to train statistical models in an unsupervised way.
Similar resources
Bayesian Learning of a Language Model from Continuous Speech
We propose a novel scheme to learn a language model (LM) for automatic speech recognition (ASR) directly from continuous speech. In the proposed method, we first generate phoneme lattices using an acoustic model with no linguistic constraints, then perform training over these phoneme lattices, simultaneously learning both lexical units and an LM. As a statistical framework for this learning pro...
Word learning in a multimodal environment
We are creating human machine interfaces which let people communicate with machines using natural modalities including speech and gesture. A problem with current multimodal interfaces is that users are forced to learn the set of words and gestures which the interface understands. We report on a trainable interface which lets the user teach the system words of their choice through natural multim...
Morpho Challenge competition 2005-2010: Evaluations and results
Morpho Challenge is an annual evaluation campaign for unsupervised morpheme analysis. In morpheme analysis, words are segmented into smaller meaningful units. This is an essential part in processing complex word forms in many large-scale natural language processing applications, such as speech recognition, information retrieval, and machine translation. The discovery of morphemes is particularl...
Unsupervised Word Induction Using MDL Criterion
Unsupervised learning of units (phonemes, words, phrases, etc.) is important to the design of statistical speech and NLP systems. This paper presents a general source-coding framework for inducing words from natural language text without word boundaries. An efficient search algorithm is developed to optimize the minimum description length (MDL) induction criterion. Despite some seemingly over-s...
Unsupervised Learning of Lexical Information for Language Processing Systems
Natural language processing systems such as speech recognition and machine translation conventionally treat words as their fundamental unit of processing. However, in many cases the definition of a “word” is not obvious, such as in languages without explicit white space delimiters, in agglutinative languages, or in streams of continuous speech. This thesis attempts to answer the question of whi...